Data Distillery: Effective Dimension Estimation via Penalized Probabilistic PCA

Authors

  • Wei Q. Deng
  • Radu V. Craiu
Abstract

The paper tackles the unsupervised estimation of the effective dimension of a sample of dependent random vectors. The proposed method uses the principal components (PC) decomposition of sample covariance to establish a low-rank approximation that helps uncover the hidden structure. The number of PCs to be included in the decomposition is determined via a Probabilistic Principal Components Analysis (PPCA) embedded in a penalized profile likelihood criterion. The choice of penalty parameter is guided by a data-driven procedure that is justified via analytical derivations and extensive finite sample simulations. Application of the proposed penalized PPCA is illustrated with three gene expression datasets in which the number of cancer subtypes is estimated from all expression measurements. The analyses point towards hidden structures in the data, e.g. additional subgroups, that could be of scientific interest.
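The selection idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure. It uses the standard Tipping and Bishop PPCA log-likelihood, profiled over the loadings and noise variance, and subtracts a generic penalty proportional to the number of free parameters. The function names, the `delta` parameter, the BIC-like default for `delta`, and the parameter count are all assumptions made for illustration; the paper's own penalty and its data-driven tuning are not reproduced here. The sketch also assumes more samples than variables, so that the sample covariance is nonsingular.

```python
import numpy as np


def ppca_profile_loglik(eigvals, n, k):
    """PPCA log-likelihood maximized over loadings and noise variance.

    eigvals: eigenvalues of the sample covariance, sorted in decreasing order.
    n: sample size. k: number of retained PCs, with 1 <= k < len(eigvals).
    """
    p = eigvals.size
    # The MLE of the noise variance is the mean of the discarded eigenvalues.
    sigma2 = eigvals[k:].mean()
    return -0.5 * n * (
        np.sum(np.log(eigvals[:k]))
        + (p - k) * np.log(sigma2)
        + p * np.log(2.0 * np.pi)
        + p
    )


def estimate_dimension(X, delta=None):
    """Pick k by maximizing a penalized profile log-likelihood (illustrative).

    delta is a hypothetical penalty weight; the default 0.5 * log(n) gives a
    BIC-like criterion and is NOT the data-driven choice from the paper.
    """
    n, p = X.shape
    if delta is None:
        delta = 0.5 * np.log(n)
    S = np.cov(X, rowvar=False, bias=True)      # MLE of the covariance
    eigvals = np.linalg.eigvalsh(S)[::-1]       # decreasing order
    eigvals = np.clip(eigvals, 1e-12, None)     # guard against log(0)
    scores = []
    for k in range(1, p):
        # Assumed parameter count: loadings up to rotation, plus noise variance.
        n_params = p * k - k * (k - 1) / 2 + 1
        scores.append(ppca_profile_loglik(eigvals, n, k) - delta * n_params)
    return int(np.argmax(scores)) + 1
```

Calling `estimate_dimension(X)` on an `n × p` data matrix returns the number of PCs with the highest penalized score. Larger values of `delta` favor smaller dimensions, which is why the choice of penalty parameter matters in practice.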

Similar resources

Penalized Bregman Divergence Estimation via Coordinate Descent

Variable selection via penalized estimation is appealing for dimension reduction. For penalized linear regression, Efron, et al. (2004) introduced the LARS algorithm. Recently, the coordinate descent (CD) algorithm was developed by Friedman, et al. (2007) for penalized linear regression and penalized logistic regression and was shown to gain computational superiority. This paper explores...

A Deflation Method for Structured Probabilistic PCA

Modern treatments of structured Principal Component Analysis often focus on the estimation of a single component under various assumptions or priors, such as sparsity and smoothness, and then the procedure is extended to multiple components by sequential estimation interleaved with deflation. While prior work has highlighted the importance of proper deflation for ensuring the quality of the est...

Probabilistic PCA for t distributions

Principal component analysis (PCA) is a popular technique for dimension reduction. Since the scope of its application is limited by its global linearity, several generalizations are proposed in the literature, among which the probabilistic PCA, introduced recently by Tipping and Bishop, is a particularly important one. Based on a probabilistic model, these authors obtained a PCA type projection...

Bayesian Inference on Principal Component Analysis Using Reversible Jump Markov Chain Monte Carlo

Based on the probabilistic reformulation of principal component analysis (PCA), we consider the problem of determining the number of principal components as a model selection problem. We present a hierarchical model for probabilistic PCA and construct a Bayesian inference method for this model using reversible jump Markov chain Monte Carlo (MCMC). By regarding each principal component as a poin...

Predictive model building for microarray data using generalized partial least squares model

Microarray technology enables simultaneously monitoring the expression of hundreds of thousands of genes in an entire genome. This results in microarray data with the number of genes p far exceeding the number of samples n. Traditional statistical methods do not work well when n ≪ p. Dimension reduction methods are often required before applying standard statistical methods, popular among the...

Publication date: 2018